Journal of Beijing University of Posts and Telecommunications

  • EI Core Journal

JOURNAL OF BEIJING UNIVERSITY OF POSTS AND TELECOM ›› 2006, Vol. 29 ›› Issue (2): 18-21. doi: 10.13190/jbupt.200602.18.lül

• Papers •

Realizing English Text Classification with Semantic Set Index Method

Lv Lin, LIU Yushu, LIU Yan

  1. School of Management and Economics, Beijing Institute of Technology
  • Online: 2006-04-28 Published: 2006-04-28

Abstract: To overcome the limitations of existing text classification methods based on the bag-of-words representation, an English text classification method based on a semantic set index is presented, built on the WordNet thesaurus and the LSI (latent semantic indexing) model. In the initial stage of text classification, the method constructs a semantic thesaurus database from WordNet and replaces the bag-of-words with a bag-of-semantic-sets as the elements of the text feature vector. The LSI model is then used to further mine the deeper relations among the concepts represented by the semantic sets. The method thereby incorporates linguistic knowledge and conceptual indexing into the vector space representation of text. Experiments with Naïve Bayes and simple vector distance classifiers show that the accuracy of both methods improves as the semantic analysis becomes progressively deeper, indicating that semantic mining is important for text classification.

Key words: text classification, semantic set index, latent semantic indexing
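
The first stage described in the abstract, replacing words with semantic sets before classifying by simple vector distance, can be illustrated with a minimal Python sketch. This is not the authors' implementation. The `SEMANTIC_SETS` table is a hypothetical hand-written stand-in for the WordNet-derived thesaurus database, and the function names and toy training data are assumptions made for illustration. The LSI step is omitted, and cosine similarity to a per-class centroid is assumed as the vector distance.

```python
from collections import Counter
from math import sqrt

# Hypothetical toy thesaurus: each word maps to a semantic-set ID.
# The paper builds this mapping from WordNet; these entries are made up.
SEMANTIC_SETS = {
    "car": "S_vehicle", "automobile": "S_vehicle", "auto": "S_vehicle",
    "engine": "S_engine", "motor": "S_engine",
    "doctor": "S_physician", "physician": "S_physician",
    "illness": "S_disease", "disease": "S_disease",
}

def semantic_set_vector(text):
    """Count semantic-set IDs instead of raw words.

    Words missing from the toy thesaurus are kept as their own feature.
    """
    tokens = text.lower().split()
    return Counter(SEMANTIC_SETS.get(tok, tok) for tok in tokens)

def cosine(u, v):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(u[k] * v[k] for k in u if k in v)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v) if norm_u and norm_v else 0.0

def train_centroids(labelled_docs):
    """Sum the semantic-set vectors of each class.

    Cosine similarity ignores vector length, so the sum serves as a centroid.
    """
    centroids = {}
    for text, label in labelled_docs:
        centroids.setdefault(label, Counter()).update(semantic_set_vector(text))
    return centroids

def classify(text, centroids):
    """Assign the class whose centroid is most similar to the document."""
    vec = semantic_set_vector(text)
    return max(centroids, key=lambda label: cosine(vec, centroids[label]))

# Toy training data (assumed, not from the paper).
TRAIN = [
    ("car engine", "autos"),
    ("doctor illness", "health"),
]
CENTROIDS = train_centroids(TRAIN)
```

In this sketch, `classify("automobile motor", CENTROIDS)` shares no surface words with the training text "car engine", yet both map to the same semantic sets. That is the kind of match a plain bag-of-words vector would miss.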